Scientific Python antipatterns advent calendar day sixteen

For today, an antipattern that’s easy to write by accident: modifying a list while iterating over it. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Modifying a list while iterating

When we say “modifying”, we usually mean changing the length of the list (removing or inserting items). That’s the dangerous case.

Some kinds of modification are totally fine. For example, updating items in-place doesn’t change the length, so it behaves predictably:

fruits = ["apple", "banana", "orange"]

for i, fruit in enumerate(fruits):
    fruits[i] = fruit.upper()

fruits
['APPLE', 'BANANA', 'ORANGE']

This kind of in-place update is common (and safe) because the loop still visits the same number of positions.

The problem starts when we remove items as we go. Suppose we want to remove all oranges from our list:

fruits = ["apple", "orange", "banana", "orange", "orange", "strawberry"]

for fruit in fruits:
    if fruit == "orange":
        fruits.remove(fruit)

That code looks reasonable, but if we look at the resulting list:

fruits
['apple', 'banana', 'orange', 'strawberry']

We can see that something has gone wrong; we still have some oranges.

If we add a bunch of extra printouts, along with the index, we can see where the logic goes wrong step by step.

fruits = ["apple", "orange", "banana", "orange", "orange", "strawberry"]

for index, fruit in enumerate(fruits):
    print(index, fruits, fruit)
    if fruit == "orange":
        fruits.remove(fruit)
0 ['apple', 'orange', 'banana', 'orange', 'orange', 'strawberry'] apple
1 ['apple', 'orange', 'banana', 'orange', 'orange', 'strawberry'] orange
2 ['apple', 'banana', 'orange', 'orange', 'strawberry'] orange
3 ['apple', 'banana', 'orange', 'strawberry'] strawberry
  • the element at index 0 is apple, so nothing happens
  • the element at index 1 is orange, so it gets removed
  • now we are at index 2, but the banana that was previously at index two is now at index 1, so it gets skipped completely. Instead we get the next orange at index 2, which gets removed.
  • at index 3, the orange that was previously there gets skipped, because it has now been shifted back to index 2. So we get strawberry
  • finally, Python recognises that we are at the end of the list so the loop stops

If we work our way through this, we can see the logical problem; every time we remove an element from the list, the following element will be skipped completely.

It’s easy to see why this might be a very tricky bug to fix - the code works as long as we never have two oranges in a row. So if we test the loop using small datasets:

fruits = ["apple", "orange", "banana"]

for fruit in fruits:
    if fruit == "orange":
        fruits.remove(fruit)

fruits
['apple', 'banana']

it will appear to work perfectly - the bug will only become apparent on longer lists that are likely to have adjacent oranges.

A much better solution will be to reverse the logic and create a new list:

fruits = ["apple", "orange", "banana", "orange", "orange", "strawberry"]

not_oranges = []
for fruit in fruits:
    if fruit != "orange":
        not_oranges.append(fruit)

not_oranges
['apple', 'banana', 'strawberry']

If for some reason we really want to change the original list, we could also iterate over a copy while modifying the original:

fruits = ["apple", "orange", "banana", "orange", "orange", "strawberry"]

# calling list() creates a copy
for fruit in list(fruits):
    if fruit == "orange":
        fruits.remove(fruit)

fruits
['apple', 'banana', 'strawberry']

Or if we don’t want to incur the extra memory cost of making the temporary copy, we can get away with changing the original list if we iterate in reverse:

fruits = ["apple", "orange", "banana", "orange", "orange", "strawberry"]

for index in range(len(fruits)-1, -1, -1):
    fruit = fruits[index]
    if fruit == 'orange':
        del fruits[index]

fruits
['apple', 'banana', 'strawberry']

This is much more complicated than either of the other two options, and hence much easier to accidentally introduce bugs.

In summary: changing list elements during iteration is fine; changing list length during iteration is not.

One more time; if you want to see the rest of these little write-ups, sign up for the mailing list:

Sign up for the mailing list